
Workshop #1

AI Accelerator: Device and Technology Perspective

 

 

"A Device Perspective of AI Accelerator End-to-End Co-design"

Prof. Philip WONG (Stanford)

Abstract

Performing AI tasks directly on resource-constrained edge devices calls for unprecedented energy efficiency in edge AI hardware. To deliver such efficiency, AI hardware must have enough on-chip memory capacity to prevent frequent off-chip memory accesses from dominating energy consumption. Emerging on-chip memory technologies such as resistive-switching RAM (RRAM) can potentially offer orders-of-magnitude higher density than conventional SRAM. They can not only store larger AI models on-chip, but also perform AI inference directly within the memory, further improving energy efficiency.

The characteristics of on-chip RRAM devices strongly affect accelerator performance. I will illustrate this through two case studies performed in my research group. In the first study, we examined the impact of RRAM conductance relaxation (the broadening of the conductance distribution over time) on the inference accuracy of an RRAM compute-in-memory accelerator. We performed statistical characterization of conductance relaxation on a fabricated 65K-RRAM array and, based on our findings, developed techniques to mitigate its impact on inference accuracy.
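As a back-of-the-envelope illustration of the effect described above (the conductance levels, array size, and noise magnitudes below are hypothetical, not the characterization data from the talk), relaxation can be modeled as Gaussian broadening around each programmed level, and its impact observed on an in-memory matrix-vector product:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 4-level conductance targets, in arbitrary units.
LEVELS = np.array([10.0, 20.0, 30.0, 40.0])

def program(weights):
    """Quantize weights to the nearest conductance level (ideal write)."""
    return LEVELS[np.abs(weights[..., None] - LEVELS).argmin(-1)]

def relax(g, sigma):
    """Model relaxation as Gaussian broadening around each programmed level."""
    return g + rng.normal(0.0, sigma, size=g.shape)

w = rng.uniform(10.0, 40.0, size=(64, 64))   # weights pre-mapped to the level range
x = rng.uniform(0.0, 1.0, size=64)           # input activations
g0 = program(w)
y_ideal = g0 @ x                             # in-memory matrix-vector product
for sigma in (0.1, 1.0, 3.0):
    err = np.abs(relax(g0, sigma) @ x - y_ideal).mean()
    print(f"sigma={sigma}: mean |output error| = {err:.3f}")
```

Because the errors accumulate across an entire column of devices, even modest broadening of the per-device distributions degrades the dot-product outputs, which is why statistical characterization and mitigation matter.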

In the second study, we developed an MLC (multi-level cell) RRAM device engineering concept, step RRAM, that reduces the number of write/verify cycles needed to program MLC RRAM. We further built models to explore the design space of the RRAM macro, examining overall step-RRAM macro density, energy efficiency, and read latency. This work opens a path toward practical MLC RRAM applications.
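The cycle-count benefit can be pictured with a toy program-and-verify loop (this captures only the counting intuition, not the step-RRAM device mechanism; the targets, step sizes, and tolerance are hypothetical):

```python
import numpy as np

def write_verify(target, step, tol=0.5, g0=0.0, max_cycles=100):
    """Toy program-and-verify loop: each pulse moves the conductance by at
    most `step` until it verifies within `tol` of the target level."""
    g, cycles = g0, 0
    while abs(g - target) > tol and cycles < max_cycles:
        g += np.sign(target - g) * min(step, abs(target - g))
        cycles += 1
    return g, cycles

# A larger, well-matched conductance step reaches the target level in far
# fewer write/verify cycles.
for step in (0.5, 2.0, 5.0):
    g, n = write_verify(target=20.0, step=step)
    print(f"step={step}: g={g:.1f} after {n} cycles")
```

Each avoided cycle saves both programming energy and latency, which is the macro-level motivation for engineering devices whose conductance moves in well-defined steps.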

 

Prof. Philip WONG

"Memristor-based Echo State Networks"

Prof. Zhongrui WANG (HKU)

Abstract

The unprecedented growth of the Internet of Things (IoT) has caused an explosion of data generated by smart edge devices, leading to a surge of interest in edge AI. This poses a major challenge to conventional digital hardware because of its physically separated memory and processing units and the approaching transistor scaling limit. Memristors are regarded as a promising solution for efficient and portable deep learning. However, their ionic resistive switching incurs large programming stochasticity and energy, which undermines their advantages for edge AI. In this talk, we will discuss how randomized algorithms in machine learning enable novel hardware-software co-designs that address these challenges. Such co-designs not only leverage the highly parallel and efficient in-memory computing of memristors, but also turn their disadvantageous stochasticity into an advantage. We will first introduce a memristor-based random convolutional echo state network for spatiotemporal signal learning. Second, we will discuss how random memristor arrays can be leveraged for graph learning. Third, we will show that such graph embeddings, when combined with a memristive associative memory, meet the demand for few-shot graph learning.
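A minimal software sketch of the echo-state idea, with NumPy random matrices standing in for the fixed, stochastic as-fabricated memristor conductances (the sizes and the sine-prediction task are illustrative assumptions, not the talk's experiments):

```python
import numpy as np

rng = np.random.default_rng(1)

# In a memristive ESN, the fixed random input and reservoir matrices would
# be the as-fabricated (stochastic) conductances of memristor arrays, and
# only the linear readout would be trained.
N_IN, N_RES = 1, 50
W_in = rng.uniform(-0.5, 0.5, (N_RES, N_IN))
W_res = rng.uniform(-1.0, 1.0, (N_RES, N_RES))
W_res *= 0.9 / np.max(np.abs(np.linalg.eigvals(W_res)))  # echo-state property

def reservoir_states(u):
    """Drive the fixed random reservoir with input sequence u."""
    x, states = np.zeros(N_RES), []
    for sample in u:
        x = np.tanh(W_in @ np.atleast_1d(sample) + W_res @ x)
        states.append(x.copy())
    return np.array(states)

# Task: one-step-ahead prediction of a sine wave; only the readout is trained.
t = np.linspace(0, 8 * np.pi, 400)
u, y = np.sin(t[:-1]), np.sin(t[1:])
X = reservoir_states(u)
X, y = X[50:], y[50:]                          # discard washout transient
W_out, *_ = np.linalg.lstsq(X, y, rcond=None)  # train the linear readout
print("train MSE:", np.mean((X @ W_out - y) ** 2))
```

The key point is that the random matrices are never trained, so their programming stochasticity costs nothing; all learning is confined to the cheap linear readout.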

References
  1. Wang, Z., Wu, H., Burr, G.W., et al., "Resistive switching materials for information processing", Nature Reviews Materials 5, 173–195 (2020)
  2. Wang, S., et al., "Echo state graph neural networks with analogue random resistor arrays", arXiv:2112.15270 (2021)

 

Prof. Zhongrui WANG

"Defect Tolerant Memristive Neural Networks"

Prof. Can LI (HKU)

Abstract

Analog computing pre-dates digital computing but was long forgotten due to the rapid development of the latter. While very powerful, digital computing is highly inefficient at perception-related tasks. This problem becomes increasingly significant as artificial intelligence grows rapidly and transistor scaling approaches its physical (and economic) limits. Since memristors were experimentally demonstrated in 2008 by Hewlett Packard Labs, researchers have extensively explored their ability to store and process information in the analog domain.

Despite great promise in the laboratory, the memristor crossbar, a matrix-multiplication accelerator built on non-volatile resistive analog memory, remains a high-risk technology, largely because immature devices lead to unexpected analog computing errors. This talk will present our recent progress in tackling those challenges and making the computing system tolerant of device defects. The first method is in-situ training directly on the crossbar, which adapts to defects during the training process. The second is a novel analog error-correcting code that detects and corrects error outliers exceeding a predefined threshold. Finally, we built an experimentally validated array-level crossbar model and train neural networks within it before deploying them to memristor hardware, making the inference operation more tolerant of common device errors. We expect the schemes introduced here to make analog computing more feasible.
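To illustrate the flavor of threshold-based outlier correction on a matrix multiply (this is a generic checksum sketch in the style of algorithm-based fault tolerance, not the analog error-correcting code presented in the talk), one can append a checksum row and column and use their residuals to locate and subtract a single large error:

```python
import numpy as np

rng = np.random.default_rng(2)

def abft_matmul(W, X, inject=None, thresh=1e-6):
    """Checksum-protected matrix multiply: a single large output outlier is
    located via its row/column checksum residuals and subtracted out."""
    Wc = np.vstack([W, W.sum(axis=0)])                   # extra checksum row
    Xc = np.hstack([X, X.sum(axis=1, keepdims=True)])    # extra checksum column
    Y = Wc @ Xc
    if inject is not None:                               # simulate one analog outlier
        i, j, e = inject
        Y[i, j] += e
    row_res = Y[:-1, :-1].sum(axis=1) - Y[:-1, -1]       # row-wise residuals
    col_res = Y[:-1, :-1].sum(axis=0) - Y[-1, :-1]       # column-wise residuals
    if np.abs(row_res).max() > thresh:                   # outlier exceeds threshold
        r, c = np.abs(row_res).argmax(), np.abs(col_res).argmax()
        Y[r, c] -= row_res[r]                            # correct it in place
    return Y[:-1, :-1]

W = rng.normal(size=(8, 4))
X = rng.normal(size=(4, 8))
Y = abft_matmul(W, X, inject=(3, 5, 7.0))
print("max |error| after correction:", np.abs(Y - W @ X).max())
```

The threshold is what makes such schemes compatible with analog hardware: small residuals from benign device noise are ignored, while rare large outliers are caught and removed.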

References
  1. C. Li, et al., "Efficient and self-adaptive in-situ learning in multilayer memristor neural networks", Nature Communications 9, 1 (2018)
  2. C. Li, et al., "Analog error correcting codes for defect tolerant matrix multiplication in crossbars", IEDM 2020
  3. R. Mao, et al., "Experimentally-Validated Crossbar Model for Defect-Aware Training of Neural Networks", IEEE TCAS-II (early access), 2022

 

Prof. Can LI

"Device Variation-Aware Quantization for In-Memory Computing with SOT-MRAM"

Prof. Qiming SHAO (HKUST)

Abstract

Hardware-accelerated artificial intelligence is pushing both algorithms and hardware to their design limits. To further increase performance, software-hardware co-design is becoming increasingly important. One of the biggest restrictions for analog in-memory computing (IMC) is device variation and noise in the memory array. Spin-orbit torque magnetic random access memory (SOT-MRAM) is an emerging non-volatile memory (eNVM) with fast write/read speed (~10 ns) and low power consumption. Like other eNVMs, MRAM also suffers from variation and noise in analog IMC. Fortunately, its cycle-to-cycle variation is very small, so with the proper method we can overcome the device-to-device variation and thus greatly reduce the total variation during computing.

In IMC, the quantized weights of neural networks are represented by the conductances of the memory devices. Instead of relying on network robustness to force the network and the memory devices to match each other, the two can be adaptively tuned to cooperate. We used a device variation-aware quantization method that quantizes the weights according to the conductance matrix of the corresponding memory device array. To reduce the extra effort of this quantization scheme, we first train and quantize the network in a conventional variation-aware way, then re-quantize the weights with one-shot tuning, yielding a variation-free model. Compared with traditional hybrid off-chip and on-chip training, our device variation-aware quantization requires only one-shot tuning, which not only greatly reduces on-chip programming energy and latency, but also extends the lifetime of the memristor array during the inference stage.
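A minimal sketch of the core idea that quantization targets can follow each device's measured conductance rather than the nominal levels (the level values, array size, and variation model below are hypothetical, not the talk's SOT-MRAM data):

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical 4-level cell with a fixed per-device offset standing in for
# device-to-device variation, assumed measurable once since cycle-to-cycle
# variation is small.
NOMINAL = np.array([1.0, 2.0, 3.0, 4.0])
SHAPE = (16, 16)
offset = rng.normal(0.0, 0.15, size=SHAPE)

def naive_quantize(w):
    """Map each weight to the nearest nominal level, ignoring variation."""
    return NOMINAL[np.abs(w[..., None] - NOMINAL).argmin(-1)]

def aware_quantize(w):
    """Map each weight to the nearest conductance its own device actually
    realizes (nominal level shifted by that device's measured offset)."""
    realizable = NOMINAL[None, None, :] + offset[..., None]
    idx = np.abs(w[..., None] - realizable).argmin(-1)
    return np.take_along_axis(realizable, idx[..., None], -1)[..., 0]

w = rng.uniform(1.0, 4.0, size=SHAPE)
stored_naive = naive_quantize(w) + offset   # the device still adds its offset
stored_aware = aware_quantize(w)
print("naive mean |w - g|:", np.abs(w - stored_naive).mean())
print("aware mean |w - g|:", np.abs(w - stored_aware).mean())
```

Because the aware scheme chooses among the conductances each device can actually reach, its per-weight error is never worse than the naive mapping, and no iterative on-chip retraining is needed beyond the one-shot re-quantization.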

References
  1. Qiming Shao, Zhongrui Wang & J. Joshua Yang, Efficient AI with MRAM, Nature Electronics 5, 67–68 (2022)
  2. Seungchul Jung, et al., A crossbar array of magnetoresistive memory devices for in-memory computing, Nature 601, 211–216 (2022)
  3. Peng Yao, et al., Fully hardware-implemented memristor convolutional neural network, Nature 577, 641–646 (2020)
  4. Jonas Doevenspeck et al., SOT-MRAM based analog in-memory computing for DNN inference, 2020 IEEE Symposium on VLSI Technology

 

Prof. Qiming SHAO

"Resistive Memory: From Technology to Device Model"

Prof. Mansun CHAN (HKUST)

Abstract

Recent developments in neuromorphic computing have generated strong interest in resistive memory, and many new memory technologies have been developed around the world. However, technology deployment has been relatively slow. One major issue is the lack of a reliable simulation strategy and a mature device model for circuit simulation: the conventional approach to simulating logic circuits is not memory-friendly, and memory models must work around a number of fundamental limitations before they can be used in a circuit simulator. In this presentation, I will discuss the simulation infrastructure required for neuromorphic computing and an approach to developing memory models that can handle detailed temporal variations with respect to incoming signals and data.
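As a sketch of the kind of behavioral memory model under discussion, the classic linear-ion-drift (HP-style) memristor equations can be integrated directly against an arbitrary input waveform (parameters are illustrative; a practical compact model must also handle window functions, asymmetry, and variability):

```python
import numpy as np

# Toy linear-ion-drift memristor model: state w in [0, 1] is the doped
# fraction of the film; resistance interpolates between R_ON and R_OFF.
R_ON, R_OFF = 100.0, 16e3        # ohms
MU_V, D = 1e-14, 1e-8            # ion mobility (m^2/(V*s)), film thickness (m)

def simulate(v_of_t, dt=1e-4, w0=0.1):
    """Forward-Euler integration of the state equation dw/dt = mu*R_ON/D^2 * i."""
    w, ws, cur = w0, [], []
    for v in v_of_t:
        r = R_ON * w + R_OFF * (1.0 - w)
        i = v / r
        w += MU_V * R_ON / D**2 * i * dt   # linear ion drift
        w = min(max(w, 0.0), 1.0)          # hard state bounds
        ws.append(w)
        cur.append(i)
    return np.array(ws), np.array(cur)

# SET with +1 V for 1 s, then RESET with -1 V for 1 s.
w_set, _ = simulate(np.full(10000, 1.0))
w_rst, _ = simulate(np.full(10000, -1.0), w0=w_set[-1])
print("after SET:   w =", w_set[-1])
print("after RESET: w =", w_rst[-1])
```

Even this toy model shows why memory simulation stresses conventional circuit simulators: the device response depends on the full temporal history of the applied signal, not just its instantaneous value.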

References
  1. https://www.youtube.com/channel/UCQKeknQioXvHk1wZZB-dliw

 

Prof. Mansun CHAN
